66 research outputs found

    Contextual Bandits with Cross-learning

    In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where, in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic). Comment: 48 pages, 5 figures
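The cross-learning feedback structure described in this abstract can be illustrated with a small sketch. It is not the algorithm from the paper: the helper names `cross_learning_update` and `empirical_means`, the table layout, and the sample rewards are all assumptions chosen for illustration. The point is only that one pull of an action yields an observation for every context, which is why the number of contexts $C$ can drop out of the amount of exploration needed.

```python
import numpy as np

def cross_learning_update(sums, counts, action, rewards_by_context):
    """Record the reward of `action` for every context at once.

    sums, counts: float arrays of shape (C, K) with running reward sums
    and observation counts per (context, action) pair.
    rewards_by_context: length-C array whose entry c' is r_{a,t}(c').
    """
    sums[:, action] += rewards_by_context
    counts[:, action] += 1

def empirical_means(sums, counts):
    # Pairs that were never observed get mean 0.0 instead of a division by zero.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Hypothetical instance: C = 4 contexts, K = 3 actions.
C, K = 4, 3
sums = np.zeros((C, K))
counts = np.zeros((C, K))

# One round: the learner plays action 1 and sees its reward under all contexts.
cross_learning_update(sums, counts, action=1,
                      rewards_by_context=np.array([0.1, 0.4, 0.7, 1.0]))
means = empirical_means(sums, counts)
```

After this single round, the estimate for action 1 is populated in all four contexts, whereas in the classical setting only the observed context's entry would change.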

    Contextual Standard Auctions with Budgets: Revenue Equivalence and Efficiency Guarantees

    The internet advertising market is a multi-billion-dollar industry, in which advertisers buy thousands of ad placements every day by repeatedly participating in auctions. In recent years, the industry has shifted to first-price auctions as the preferred paradigm for selling advertising slots. Another important and ubiquitous feature of these auctions is the presence of campaign budgets, which specify the maximum amount the advertisers are willing to pay over a specified time period. In this paper, we present a new model to study the equilibrium bidding strategies in standard auctions, a large class of auctions that includes first- and second-price auctions, for advertisers who satisfy budget constraints on average. Our model dispenses with the common, yet unrealistic, assumption that advertisers' values are independent and instead assumes a contextual model in which advertisers determine their values using a common feature vector. We show the existence of a natural value-pacing-based Bayes-Nash equilibrium under very mild assumptions. Furthermore, we prove a revenue equivalence showing that all standard auctions yield the same revenue even in the presence of budget constraints. Leveraging this equivalence, we prove Price of Anarchy bounds for liquid welfare and structural properties of pacing-based equilibria that hold for all standard auctions. Our work takes an important step toward understanding the implications of the shift to first-price auctions in internet advertising markets.

    Single-Leg Revenue Management with Advice

    Single-leg revenue management is a foundational problem of revenue management that has been particularly impactful in the airline and hotel industry: Given $n$ units of a resource, e.g., flight seats, and a stream of sequentially arriving customers segmented by fares, what is the optimal online policy for allocating the resource? Previous work focused on designing algorithms when forecasts are available, which are not robust to inaccuracies in the forecast, or online algorithms with worst-case performance guarantees, which can be too conservative in practice. In this work, we look at the single-leg revenue management problem through the lens of the algorithms-with-advice framework, which attempts to harness the increasing prediction accuracy of machine learning methods by optimally incorporating advice about the future into online algorithms. In particular, we characterize the Pareto frontier that captures the tradeoff between consistency (performance when advice is accurate) and competitiveness (performance when advice is inaccurate) for every advice. Moreover, we provide an online algorithm that always achieves performance on this Pareto frontier. We also study the class of protection level policies, which is the most widely deployed technique for single-leg revenue management: we provide an algorithm to incorporate advice into protection levels that optimally trades off consistency and competitiveness. Moreover, we empirically evaluate the performance of these algorithms on synthetic data. We find that our algorithm for protection level policies performs remarkably well on most instances, even if it is not guaranteed to be on the Pareto frontier in theory. Our results extend to other unit-cost online allocation problems such as the display advertising and the multiple secretary problem.
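For readers unfamiliar with protection level policies, the following is a minimal sketch of the textbook form of such a policy, without any advice. It is not the advice-augmented algorithm from the paper. The function name `run_protection_policy`, the two fare classes, and the protection levels are hypothetical values chosen for illustration. A request in fare class `j` is accepted only while the remaining capacity exceeds the number of units protected for higher fare classes.

```python
def run_protection_policy(capacity, protect, arrivals, fares):
    """Simulate a protection level policy on a sequence of unit requests.

    capacity: number of units available at the start.
    protect:  protect[j] is the number of units reserved for classes with a
              higher fare than class j (so protect[0] == 0 for the top class).
    arrivals: sequence of fare-class indices, in arrival order.
    fares:    fares[j] is the revenue from accepting a class-j request.
    """
    remaining = capacity
    revenue = 0.0
    for j in arrivals:
        # Accept only if selling this unit leaves the protected units intact.
        if remaining > protect[j]:
            remaining -= 1
            revenue += fares[j]
    return revenue, remaining

# Hypothetical instance: class 0 pays 300, class 1 pays 100,
# and 2 of the 4 units are protected for class 0.
fares = [300.0, 100.0]
protect = [0, 2]
arrivals = [1, 1, 1, 1, 0, 0]  # low-fare customers arrive first
revenue, remaining = run_protection_policy(4, protect, arrivals, fares)
```

In this instance the policy sells two units to low-fare customers, rejects the next two, and keeps the last two units for the high-fare customers who arrive later.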

    Online Resource Allocation under Horizon Uncertainty

    We study stochastic online resource allocation: a decision maker needs to allocate limited resources to stochastically generated, sequentially arriving requests in order to maximize reward. At each time step, requests are drawn independently from a distribution that is unknown to the decision maker. Online resource allocation and its special cases have been studied extensively in the past, but prior results crucially and universally rely on the strong assumption that the total number of requests (the horizon) is known to the decision maker in advance. In many applications, such as revenue management and online advertising, the number of requests can vary widely because of fluctuations in demand or user traffic intensity. In this work, we develop online algorithms that are robust to horizon uncertainty. In sharp contrast to the known-horizon setting, no algorithm can achieve even a constant asymptotic competitive ratio that is independent of the horizon uncertainty. We introduce a novel generalization of dual mirror descent which allows the decision maker to specify a schedule of time-varying target consumption rates, and prove corresponding performance guarantees. We go on to give a fast algorithm for computing a schedule of target consumption rates that leads to near-optimal performance in the unknown-horizon setting. In particular, our competitive ratio attains the optimal rate of growth (up to logarithmic factors) as the horizon uncertainty grows large. Finally, we also provide a way to incorporate machine-learned predictions about the horizon which interpolates between the known and unknown horizon settings.
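The idea of a dual update driven by a schedule of target consumption rates can be sketched for a single resource as follows. This is an illustrative assumption, not the paper's algorithm: it uses the Euclidean special case of mirror descent (projected subgradient descent), and the function name `dual_descent_allocation`, the request list, the constant schedule, and the step size are all hypothetical. The dual variable `mu` acts as a price per unit of resource: it rises when consumption in a round exceeds that round's target and falls otherwise.

```python
def dual_descent_allocation(requests, budget, target_rates, step_size):
    """Accept or reject requests using a price updated by dual descent.

    requests:     list of (reward, cost) pairs, in arrival order.
    budget:       total amount of the single resource.
    target_rates: target consumption for each round (same length as requests).
    step_size:    step size of the dual update.
    """
    mu = 0.0                 # dual variable: price per unit of resource
    remaining = budget
    total_reward = 0.0
    decisions = []
    for (reward, cost), rho in zip(requests, target_rates):
        # Accept when the reward beats the priced cost and the budget allows it.
        accept = reward > mu * cost and cost <= remaining
        used = cost if accept else 0.0
        if accept:
            remaining -= cost
            total_reward += reward
        decisions.append(accept)
        # Projected subgradient step: raise the price after over-consuming
        # relative to this round's target, lower it after under-consuming.
        mu = max(0.0, mu + step_size * (used - rho))
    return total_reward, remaining, decisions

# Hypothetical instance with a constant schedule of 0.5 units per round.
requests = [(1.0, 1.0), (0.2, 1.0), (0.9, 1.0), (0.1, 1.0)]
total_reward, remaining, decisions = dual_descent_allocation(
    requests, budget=2.0, target_rates=[0.5] * 4, step_size=1.0
)
```

Replacing the constant list with a non-constant one is all it takes to try a different schedule in this sketch; how to choose a good schedule under horizon uncertainty is the question the abstract addresses.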